Introduction

Video games have become one of society’s most beloved forms of entertainment. They are a unique medium that combines interactive storytelling with tests of player skill, and they come in many forms and with many features. Understanding how games are designed and received is crucial both for innovation during development and for aligning games with public demand. Examining aspects such as genre choice, the teams behind a game, and how gamers respond after launch can help developers make key adjustments, inform academic research and social commentary, and guide players toward worthwhile purchases.

Presentation and description of the problem

Our study analyzes a dataset of the most popular video games released between 1980 and 2023. The analysis has three parts: Genre Analysis, Developer Analysis, and Player Engagement Analysis. Through these, we hope to reveal essential information about what drives a video game’s longevity and success. Our primary objective is to give gaming professionals and developers insight into the principles that distinguish successful titles from fleeting ones.

We believe that analyzing a broad spectrum of popular video games across time provides valuable knowledge about genre selection, developers’ contributions to a title’s success, and the factors that affect player engagement. These insights should help developers make informed decisions about production and promotion strategies based on player preferences. They should also help professionals understand trends in the gaming market and stay ahead of them. In summary, the goal of this analysis is to answer questions about video game popularity and successful game development.

Presentation of the Data

Importing the data and visualising it

library(tidyverse)     # dplyr, tidyr, stringr, readr and ggplot2 are used throughout
library(DT)            # datatable()
library(RColorBrewer)  # brewer.pal()

# Row 1253 is dropped: it is an unreleased game that nevertheless has the highest review score
dataset <- read.csv("D:/Faks/Year 3/Data Programming/Project/games.csv")[-1253, ]
# Columns 9 and 10 (Summary and Reviews) are hidden to keep the table compact
datatable(dataset, rownames = TRUE, filter = "top", caption = "Games Data Set", options = list(searching = FALSE, pageLength = 10, lengthMenu = c(5, 10, 15, 20), scrollX = TRUE, autoWidth = TRUE, columnDefs = list(
      list(targets = c(9, 10), visible = FALSE)
    )))

In this data table I have hidden the Summary and Reviews columns because they contain long blocks of text that would greatly increase the size of the table. Both columns remain in the data set itself.

Contents description

colnames(dataset)
##  [1] "X"                 "Title"             "Release.Date"     
##  [4] "Team"              "Rating"            "Times.Listed"     
##  [7] "Number.of.Reviews" "Genres"            "Summary"          
## [10] "Reviews"           "Plays"             "Playing"          
## [13] "Backlogs"          "Wishlist"

Meaning of column names

  1. “X”: Index of the row
  2. “Title”: Title of the game
  3. “Release.Date”: Date of release of the game’s first version
  4. “Team”: Game developer team
  5. “Rating”: Average rating
  6. “Times.Listed”: Number of users who listed this game
  7. “Number.of.Reviews”: Number of reviews received from the users
  8. “Genres”: All genres pertaining to a specified game
  9. “Summary”: Summary provided by the team
  10. “Reviews”: User reviews
  11. “Plays”: Number of users that have played the game before
  12. “Playing”: Number of users who are currently playing the game
  13. “Backlogs”: Number of users who have access to the game but haven’t started playing it yet
  14. “Wishlist”: Number of users who wish to play the game

Overview of Data

Data-type info

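With the function str(dataset) we display the internal structure of the data set. Note that the count columns (Times.Listed, Number.of.Reviews, Plays, Playing, Backlogs and Wishlist) are stored as character strings such as "3.9K" rather than as numbers.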

str(dataset)
## 'data.frame':    1511 obs. of  14 variables:
##  $ X                : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Title            : chr  "Elden Ring" "Hades" "The Legend of Zelda: Breath of the Wild" "Undertale" ...
##  $ Release.Date     : chr  "Feb 25, 2022" "Dec 10, 2019" "Mar 03, 2017" "Sep 15, 2015" ...
##  $ Team             : chr  "['Bandai Namco Entertainment', 'FromSoftware']" "['Supergiant Games']" "['Nintendo', 'Nintendo EPD Production Group No. 3']" "['tobyfox', '8-4']" ...
##  $ Rating           : num  4.5 4.3 4.4 4.2 4.4 4.3 4.2 4.3 3 4.3 ...
##  $ Times.Listed     : chr  "3.9K" "2.9K" "4.3K" "3.5K" ...
##  $ Number.of.Reviews: chr  "3.9K" "2.9K" "4.3K" "3.5K" ...
##  $ Genres           : chr  "['Adventure', 'RPG']" "['Adventure', 'Brawler', 'Indie', 'RPG']" "['Adventure', 'RPG']" "['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']" ...
##  $ Summary          : chr  "Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and spells. Rise, "| __truncated__ "A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the dead, attempts "| __truncated__ "The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Link can travel an"| __truncated__ "A small child falls into the Underground, where monsters have long been banished by humans and are hunting ever"| __truncated__ ...
##  $ Reviews          : chr  "[\"The first playthrough of elden ring is one of the best eperiences gaming can offer you but after youve explo"| __truncated__ "['convinced this is a roguelike for people who do not like the genre. The art is technically good but the aesth"| __truncated__ "['This game is the game (that is not CS:GO) that I have played the most ever. I have played this game for 400 h"| __truncated__ "['soundtrack is tied for #1 with nier automata.  a super charming story and characters which have become iconic"| __truncated__ ...
##  $ Plays            : chr  "17K" "21K" "30K" "28K" ...
##  $ Playing          : chr  "3.8K" "3.2K" "2.5K" "679" ...
##  $ Backlogs         : chr  "4.6K" "6.3K" "5K" "4.9K" ...
##  $ Wishlist         : chr  "4.8K" "3.6K" "2.6K" "1.8K" ...
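
Because the count columns are character strings with a "K" suffix, they have to be converted before any arithmetic can be done on them. The following is a minimal base R sketch of such a conversion. The helper name parse_count is hypothetical and not part of the original analysis, and the sketch assumes that "K" (thousands) is the only suffix that occurs.

```r
# Hypothetical helper (not part of the original analysis): converts
# abbreviated count strings such as "3.9K" or "679" to plain numbers.
# Assumption: "K" (thousands) is the only suffix present in the data.
parse_count <- function(x) {
  x <- trimws(as.character(x))
  has_k <- grepl("K$", x)                 # TRUE where the value ends in "K"
  value <- as.numeric(sub("K$", "", x))   # numeric part without the suffix
  ifelse(has_k, value * 1000, value)
}

# Values taken from the str() output above
counts <- parse_count(c("3.9K", "17K", "679"))
print(counts)
```

The Player Engagement Analysis later in this report performs the same conversion inline with parse_number() and grepl().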

Summarization

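With the summary(dataset) function we obtain summary statistics for each column. Because most columns are stored as character strings, only X and Rating receive numeric summaries. Rating also contains 13 missing values.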

summary(dataset)
##        X             Title           Release.Date           Team          
##  Min.   :   0.0   Length:1511        Length:1511        Length:1511       
##  1st Qu.: 377.5   Class :character   Class :character   Class :character  
##  Median : 755.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 755.2                                                           
##  3rd Qu.:1132.5                                                           
##  Max.   :1511.0                                                           
##                                                                           
##      Rating      Times.Listed       Number.of.Reviews     Genres         
##  Min.   :0.700   Length:1511        Length:1511        Length:1511       
##  1st Qu.:3.400   Class :character   Class :character   Class :character  
##  Median :3.800   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :3.719                                                           
##  3rd Qu.:4.100                                                           
##  Max.   :4.600                                                           
##  NA's   :13                                                              
##    Summary            Reviews             Plays             Playing         
##  Length:1511        Length:1511        Length:1511        Length:1511       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    Backlogs           Wishlist        
##  Length:1511        Length:1511       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
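
The same information about column types and missing values can be collected more compactly. The sketch below uses a small made-up data frame (toy) in place of the real data set, so the values are illustrative only.

```r
# Made-up stand-in for the real data set (illustrative values only)
toy <- data.frame(
  Title  = c("Game A", "Game B", "Game C"),
  Rating = c(4.5, NA, 3.8),
  Plays  = c("17K", "21K", "679"),
  stringsAsFactors = FALSE
)

# Storage class of every column
col_classes <- sapply(toy, class)

# Number of missing values per column
na_counts <- colSums(is.na(toy))

print(col_classes)
print(na_counts)
```

Applied to the real data, colSums(is.na(dataset)) should report the 13 missing ratings that summary() shows above.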

Exploratory data analysis and Visualization

Genre Analysis

The first analysis we conduct is the Genre Analysis. Here we focus on which game genres are rated most highly by gamers, in order to understand which genres are more likely to succeed and attract a larger audience. By examining the average ratings of different genres, we can understand the preferences and tastes of gamers. This information will be valuable for game developers and publishers, as it can help guide their decisions during game development.

# Create a new dataset for genre analysis
genre_dataset <- dataset

# Genres is stored as a list-like string, e.g. "['Adventure', 'RPG']".
# Extract each quoted genre, give it its own row, and strip the quote characters
genre_dataset <- genre_dataset %>%
  mutate(Genres = str_extract_all(as.character(Genres), "'(.*?)'")) %>%
  unnest(Genres) %>%
  mutate(Genres = str_remove_all(Genres, "'"))

# Calculating average rating per genre
genre_ratings <- genre_dataset %>%
  group_by(Genres) %>%
  summarize(AverageRating = mean(Rating, na.rm = TRUE))  

# Creating a column chart with genre names on the x-axis and ratings on the y-axis
ggplot(data = genre_ratings, aes(x = reorder(Genres, AverageRating), y = AverageRating)) +
  geom_col(fill = "steelblue", width = 0.7) +
  labs(x = "Genre", y = "Average Rating", title = "Average Rating by Genre") +
  scale_y_continuous(breaks = seq(0, ceiling(max(genre_ratings$AverageRating)), by = 0.5)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 8))

As the graph shows, the highest-rated game genres among gamers are RPG, Turn Based Strategy and Visual Novel, of which Visual Novel is the only genre with an average rating above 4.0.
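
To make the genre split easier to follow, here is a minimal base R sketch of the same idea on made-up data. It is not the code used for the chart: it swaps the tidyverse functions for regmatches() and tapply(), and it also reports how many games fall into each genre, since an average based on very few games is less reliable. All titles and ratings below are invented for illustration.

```r
# Toy data in the same list-like string format as the Genres column (made-up values)
toy <- data.frame(
  Title  = c("Game A", "Game B", "Game C"),
  Rating = c(4.5, 4.0, 3.0),
  Genres = c("['Adventure', 'RPG']", "['RPG']", "['Adventure', 'Indie']"),
  stringsAsFactors = FALSE
)

# Extract every quoted label, then drop the surrounding quote characters
genre_list <- regmatches(toy$Genres, gregexpr("'[^']*'", toy$Genres))
genre_list <- lapply(genre_list, function(g) gsub("'", "", g))

# One row per (game, genre) pair; each rating is repeated once per genre of the game
long <- data.frame(
  Genre  = unlist(genre_list),
  Rating = rep(toy$Rating, lengths(genre_list)),
  stringsAsFactors = FALSE
)

# Average rating and number of games per genre
avg <- tapply(long$Rating, long$Genre, mean, na.rm = TRUE)
n   <- table(long$Genre)
genre_summary <- data.frame(
  Genre         = names(avg),
  AverageRating = as.numeric(avg),
  Games         = as.integer(n[names(avg)])
)

print(genre_summary)
```

A game with several genres contributes its rating to each of them, which is also how the analysis above treats multi-genre games.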

Developer Analysis

This analysis focuses on game development companies. We hope to uncover which companies are the most prominent in the video game industry by counting how many of the popular games in the dataset each company is credited with. This information will be valuable for investors, as it can help guide their decisions about which companies to invest in.

# Creating a new dataset for team analysis
team_dataset <- dataset

# Team is stored as a list-like string, e.g. "['Nintendo', 'Capcom']".
# Extract each quoted team, give it its own row, and strip the quote characters
team_dataset <- team_dataset %>%
  mutate(Team = str_extract_all(as.character(Team), "'(.*?)'")) %>%
  unnest(Team) %>%
  mutate(Team = str_remove_all(Team, "'"))

# Counting the number of games developed per team
team_counts <- team_dataset %>%
  group_by(Team) %>%
  summarize(GamesDeveloped = n())

# Selecting the top 5 teams with the most games developed
top_teams <- team_counts %>%
  top_n(5, wt = GamesDeveloped)  # Select top 5 teams based on games developed

# Defining a bright color palette
colors <- brewer.pal(length(top_teams$GamesDeveloped), "Set1")

# Creating a pie chart to visualize the distribution of games developed among the top 5 teams
ggplot(data = top_teams, aes(x = "", y = GamesDeveloped, fill = Team)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(fill = "Team", x = NULL, y = NULL, title = "Distribution of Games developed\namong Game Development Companies") +
  theme_void() +
  theme(legend.position = "bottom") +
  geom_text(aes(label = paste(GamesDeveloped)), position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = colors)

As the graph shows, the most prominent game development companies are Capcom, Electronic Arts, Nintendo, Sega and Square Enix. Nintendo is by far the most prolific of them, with 245 of the games on this list credited to it.
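
The counting step can be illustrated with a short base R sketch on made-up data. It is an illustration under stated assumptions rather than the code behind the chart: the team strings below are invented, and a game credited to two teams is counted once for each of them, as in the analysis above.

```r
# Made-up Team strings in the same list-like format as the dataset
teams <- c(
  "['Nintendo', 'HAL Laboratory']",
  "['Nintendo']",
  "['Capcom']",
  "['Capcom', 'Nintendo']"
)

# Extract every quoted team name and drop the quote characters
team_list  <- regmatches(teams, gregexpr("'[^']*'", teams))
team_names <- gsub("'", "", unlist(team_list))

# Number of games each team is credited with, largest first
team_counts <- sort(table(team_names), decreasing = TRUE)

# Keep the two most frequent teams and express their counts as a
# percentage of all games (shares can sum to more than 100 because
# one game may credit several teams)
top_two <- head(team_counts, 2)
share   <- round(100 * top_two / length(teams), 1)

print(top_two)
print(share)
```

Reporting the share next to the raw count makes it easier to judge how dominant a single company is within the list.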

Player Engagement Analysis

This analysis focuses on player engagement with highly rated video games. Its purpose is to compare how many players have played a given game against how many own that title but haven’t started playing it yet. We focus on the top 5 games by rating. With this analysis, we hope to better understand player engagement and ownership patterns in the top-rated games. This can benefit various stakeholders in the gaming industry, including game developers, publishers, and marketers.

# Creating a new dataset for analysis
analysis_dataset <- dataset

# Extracting necessary columns for the analysis
analysis_data <- analysis_dataset %>%
  select(Title, Plays, Playing, Backlogs, Rating) %>%
  mutate(
    Plays = parse_number(Plays) * ifelse(grepl("K$", Plays), 1000, 1),
    Playing = parse_number(Playing) * ifelse(grepl("K$", Playing), 1000, 1),
    Backlogs = parse_number(Backlogs) * ifelse(grepl("K$", Backlogs), 1000, 1)
  ) %>%
  arrange(desc(Rating)) %>%
  head(5)  # Select top 5 games based on rating

# Calculating the total number of players who have played the game
analysis_data <- analysis_data %>%
  mutate(TotalPlayers = Plays + Playing)

# Calculating the total number of copies
analysis_data <- analysis_data %>%
  mutate(TotalCopies = TotalPlayers + Backlogs)

# Calculating the percentage of non-players
analysis_data <- analysis_data %>%
  mutate(NonPlayersPercentage = ceiling((Backlogs / TotalCopies) * 100))

# Creating a column chart to compare the number of players who have played the game and those who own it but haven't started
column_chart <- ggplot(data = analysis_data) +
  geom_col(aes(x = as.numeric(factor(Title)), y = TotalPlayers, fill = "Total Players"), width = 0.4, position = position_dodge(width = 0.8)) +
  geom_col(aes(x = as.numeric(factor(Title)) + 0.4, y = Backlogs, fill = "Non-Players"), width = 0.4, position = position_dodge(width = 0.8)) +
  geom_text(aes(x = as.numeric(factor(Title)) + 0.2, y = TotalPlayers, label = TotalPlayers), vjust = 1.2, hjust = 1.3) +
  geom_text(aes(x = as.numeric(factor(Title)) + 0.6, y = Backlogs, label = Backlogs), vjust = 1.1, hjust = 1.3) +
  scale_x_continuous(breaks = as.numeric(factor(analysis_data$Title)), labels = analysis_data$Title) +
  labs(x = "Game", y = "Number of Players", title = "Comparison of Players Played vs Non-Players \n (Top 5 Games by Rating)") +
  scale_fill_manual(values = c("Total Players" = "steelblue", "Non-Players" = "orange")) +
  theme_bw() +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  guides(fill = guide_legend(title = "Status")) +
  theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1))

print(column_chart)

# Defining a bright color palette
colors <- brewer.pal(length(analysis_data$NonPlayersPercentage), "Set1")


# Creating a pie chart to visualize the distribution of non-player percentages
pie_chart <- ggplot(data = analysis_data, aes(x = "", y = NonPlayersPercentage, fill = Title)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(x = NULL, y = NULL, title = "Percentage of Non-Players") +
  theme_void() +
  theme(legend.position = "right") +
  geom_text(aes(label = paste((NonPlayersPercentage), "%")), position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = colors) 

# Displaying the pie chart of non-player percentages
print(pie_chart)

From these graphs we can conclude that the split between players and non-players differs considerably across these games, even though their ratings are quite similar. For instance, “Outer Wilds” has 8361 players, the highest of these 5 games, and 4800 non-players, which means that roughly 36% of the people who own “Outer Wilds” have not played it. Similar results are shown for “Disco Elysium: The Final Cut”, which has a 40% non-play rate. On the other hand, less popular games such as “Bloodborne: The Old Hunters” have a strikingly low non-play rate of 17%, which means that most of the people who own the game did in fact play it, and it is rated highly.
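
As a worked example, the non-play rate for “Outer Wilds” can be recomputed from the two figures quoted above. The sketch also shows how the choice of rounding affects the displayed percentage: the analysis code uses ceiling(), which always rounds up, so the chart label can be one point higher than the conventionally rounded value.

```r
total_players <- 8361   # Plays + Playing for "Outer Wilds", as quoted in the text
backlogs      <- 4800   # users who have the game but have not started it

# Share of all copies that belong to users who have not started the game
non_play_rate <- backlogs / (total_players + backlogs) * 100

exact_rate   <- round(non_play_rate, 2)   # 36.47
rounded_up   <- ceiling(non_play_rate)    # 37, the rounding used in the analysis code
rounded_near <- round(non_play_rate)      # 36

print(c(exact = exact_rate, ceiling = rounded_up, nearest = rounded_near))
```

Using round() instead of ceiling() in the analysis would keep the chart labels closer to the underlying ratios.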

Conclusion

In our exploratory data analysis, we gained valuable information about the gaming industry. We identified the highest-rated game genres, namely RPG, Turn Based Strategy, and Visual Novel, and the most prominent game developers, namely Capcom, Electronic Arts, Nintendo, Sega, and Square Enix. Additionally, we analyzed player engagement and ownership patterns, revealing variations in non-play rates among highly rated games.

These findings offer valuable information for game developers, publishers, investors, and marketers. They can use these findings to make informed decisions about game development, investment opportunities, and marketing strategies. Overall, this analysis contributes to a better understanding of the gaming industry and its dynamics.

Bibliography

This study was influenced by an article that investigates engagement strategies in popular video games (Dickey 2005).

References

Dickey, Michele D. 2005. “Engaging by Design: How Engagement Strategies in Popular Computer and Video Games Can Inform Instructional Design.” Educational Technology Research and Development 53 (2): 67–83.